Sparsesegmentsum
=================

Sums sparse, segmented data. The operator sums slices of the input according to their segment IDs (``segment_ids``) and outputs a tensor whose dimensions, except dimension 0, match those of the input.

Algorithm flow:

1. Compute the total number of input elements and the size of each slice (slicing along dimension 0).
2. Compute the output shape: dimension 0 is `max(segment_ids) + 1`, and the other dimensions are the same as the input.
3. Initialize the output data to 0.
4. For each index, accumulate the corresponding data slice into the corresponding segment.

For each index `i`:

- Get the segment ID: `segment_id = in_segment_ids[i]`
- Get the input index: `input_index = in_indices[i]`
- Accumulate the input slice `in_data[input_index * n : (input_index + 1) * n]` into the output slice `out_data[segment_id * n : (segment_id + 1) * n]`

.. math::

    \text{out\_data\_shape}[0] = \max(\text{in\_segment\_ids}) + 1

.. math::

    \text{out\_data\_shape}[i] = \text{in\_data\_shape}[i], \quad \text{for } i = 1, 2, \ldots, \text{in\_data\_shape\_size} - 1

.. math::

    n = \frac{\prod_{i=0}^{\text{in\_data\_shape\_size}-1} \text{in\_data\_shape}[i]}{\text{in\_data\_shape}[0]}

.. math::

    \text{out\_data}[j + \text{segment\_id} \times n] \mathrel{+}= \text{in\_data}[j + \text{input\_index} \times n], \quad \text{for } j = 0, 1, \ldots, n-1

Here `n` is the size of each slice, that is, the number of elements in one slice along dimension 0.

Inputs:
    - **in_data** - Input data array; its shape is given by `in_data_shape` and `in_data_shape_size`.
    - **in_indices** - Input index array of size `in_indices_size`; each element is an index into dimension 0 of the input data.
    - **in_segment_ids** - Segment ID array of size `in_indices_size`; it corresponds one-to-one with `in_indices` and gives the segment each index belongs to.
    - **in_data_shape** - Shape array of the input data, of size `in_data_shape_size`.
    - **in_data_shape_size** - Number of dimensions of the input data.
    - **in_indices_size** - Size of the `in_indices` and `in_segment_ids` arrays.

Outputs:
    - **out_data** - Output data array; dimension 0 has size `max(in_segment_ids) + 1`, and the other dimensions are the same as the input.
    - **out_data_shape** - Shape array of the output data, of size `in_data_shape_size`; computed inside the operator.

Supported platforms:
    ``FT78NE`` ``MT7004``

.. note::
    - FT78NE supports fp32, int16, int32, fp64, cplx64, cplx128
    - MT7004 supports fp16, fp32, int16, int32, cplx64

**Shared-memory version:**

.. c:function:: void fp_sparsesegmentsum_s(float* in_data, float* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void hp_sparsesegmentsum_s(half* in_data, half* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void i16_sparsesegmentsum_s(int16_t* in_data, int16_t* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void i32_sparsesegmentsum_s(int32_t* in_data, int32_t* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void dp_sparsesegmentsum_s(double* in_data, double* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void c64_sparsesegmentsum_s(float* in_data, float* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

.. c:function:: void c128_sparsesegmentsum_s(double* in_data, double* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape, int core_mask)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 31-33

    // FT78NE example
    // Header names are missing in the source; include the library header(s)
    // that declare fp_sparsesegmentsum_s here.

    int main(int argc, char* argv[]) {
        // Assume the buffers reside in DDR space
        // Input shape [4, 3, 2]: 4 samples, each a 3x2 matrix
        int in_data_shape[] = {4, 3, 2};
        int in_data_shape_size = 3;

        // Input data for the 4 samples
        float *in_data = (float *)0xA0000000;
        // in_data holds 4 * 3 * 2 = 24 elements

        // Index array: select samples 0, 2 and 3
        int in_indices[] = {0, 2, 3};
        int in_indices_size = 3;

        // Segment IDs: sample 0 -> segment 0, samples 2 and 3 -> segment 1
        int in_segment_ids[] = {0, 1, 1};

        // Output shape (filled in by the operator)
        int out_data_shape[3];

        // Output data: segment 0 has 1 sample, segment 1 has 2 samples
        // Expected output shape: [2, 3, 2] (max(segment_ids) + 1 = 2)
        float *out_data = (float *)0xB0000000;

        int core_mask = 0xff;

        fp_sparsesegmentsum_s(in_data, out_data, in_indices, in_segment_ids,
                              in_data_shape, in_data_shape_size, in_indices_size,
                              out_data_shape, core_mask);

        return 0;
    }

**Private-memory version:**

.. c:function:: void fp_sparsesegmentsum_p(float* in_data, float* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void hp_sparsesegmentsum_p(half* in_data, half* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void i16_sparsesegmentsum_p(int16_t* in_data, int16_t* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void i32_sparsesegmentsum_p(int32_t* in_data, int32_t* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void dp_sparsesegmentsum_p(double* in_data, double* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void c64_sparsesegmentsum_p(float* in_data, float* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

.. c:function:: void c128_sparsesegmentsum_p(double* in_data, double* out_data, int* in_indices, int* in_segment_ids, int* in_data_shape, int in_data_shape_size, int in_indices_size, int* out_data_shape)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 29-31

    // FT78NE example
    // Header names are missing in the source; include the library header(s)
    // that declare fp_sparsesegmentsum_p here.

    int main(int argc, char* argv[]) {
        // Assume the buffers reside in L2 space
        // Input shape [4, 3, 2]: 4 samples, each a 3x2 matrix
        int in_data_shape[] = {4, 3, 2};
        int in_data_shape_size = 3;

        // Input data for the 4 samples
        float *in_data = (float *)0x10000000;
        // in_data holds 4 * 3 * 2 = 24 elements

        // Index array: select samples 0, 2 and 3
        int in_indices[] = {0, 2, 3};
        int in_indices_size = 3;

        // Segment IDs: sample 0 -> segment 0, samples 2 and 3 -> segment 1
        int in_segment_ids[] = {0, 1, 1};

        // Output shape (filled in by the operator)
        int out_data_shape[3];

        // Output data: segment 0 has 1 sample, segment 1 has 2 samples
        // Expected output shape: [2, 3, 2] (max(segment_ids) + 1 = 2)
        float *out_data = (float *)0x10010000;

        fp_sparsesegmentsum_p(in_data, out_data, in_indices, in_segment_ids,
                              in_data_shape, in_data_shape_size, in_indices_size,
                              out_data_shape);

        return 0;
    }
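
**Host-side reference sketch:**

The accumulation rule given by the formulas for ``n``, ``out_data_shape`` and ``out_data`` can be checked on a host machine with a plain C routine. The sketch below is an illustration only: ``sparse_segment_sum_ref`` is a hypothetical name, not a library function, and it covers only the fp32 case on ordinary memory. It assumes that every value in ``in_indices`` is a valid index into dimension 0, that segment IDs are non-negative, and that ``out_data`` has room for ``(max(in_segment_ids) + 1) * n`` elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of the library): follows the
 * documented rule out_data[segment_id * n + j] += in_data[input_index * n + j]. */
void sparse_segment_sum_ref(const float *in_data, float *out_data,
                            const int *in_indices, const int *in_segment_ids,
                            const int *in_data_shape, int in_data_shape_size,
                            int in_indices_size, int *out_data_shape)
{
    assert(in_data_shape_size >= 1 && in_indices_size >= 1);

    /* n: number of elements in one slice along dimension 0. */
    size_t n = 1;
    for (int d = 1; d < in_data_shape_size; ++d) {
        n *= (size_t)in_data_shape[d];
    }

    /* out_data_shape[0] = max(in_segment_ids) + 1; other dims are copied. */
    int max_id = in_segment_ids[0];
    for (int i = 1; i < in_indices_size; ++i) {
        if (in_segment_ids[i] > max_id) {
            max_id = in_segment_ids[i];
        }
    }
    out_data_shape[0] = max_id + 1;
    for (int d = 1; d < in_data_shape_size; ++d) {
        out_data_shape[d] = in_data_shape[d];
    }

    /* Initialize the whole output to 0. */
    size_t total = (size_t)(max_id + 1) * n;
    for (size_t k = 0; k < total; ++k) {
        out_data[k] = 0.0f;
    }

    /* Accumulate each selected input slice into its segment's slice. */
    for (int i = 0; i < in_indices_size; ++i) {
        const float *src = in_data + (size_t)in_indices[i] * n;
        float *dst = out_data + (size_t)in_segment_ids[i] * n;
        for (size_t j = 0; j < n; ++j) {
            dst[j] += src[j];
        }
    }
}
```

With the values from the usage examples (shape ``[4, 3, 2]``, ``in_indices = {0, 2, 3}``, ``in_segment_ids = {0, 1, 1}``), this routine writes ``out_data_shape = {2, 3, 2}``; segment 0 holds a copy of input slice 0 and segment 1 holds the element-wise sum of input slices 2 and 3.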